Source: Python for Biologists
In this folder you’ll find a text file called data.csv, containing some made-up data for a number of genes. Each line contains the following fields for a single gene, in this order: species name, sequence, gene name, expression level. The fields are separated by commas (hence the name of the file: CSV stands for Comma Separated Values). Think of it as a table in a spreadsheet: each line is a row, and each field in a line is a column. All the exercises for this section use the data read from this file.
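To make the row/column picture concrete, here is a minimal sketch of what one line of the file becomes once it is split on commas. It uses the standard `csv` module on a single line copied from data.csv, wrapped in `io.StringIO` (an assumption made only so the example runs without the file being present):

```python
import csv
import io

# One line copied from data.csv: species, sequence, gene name, expression level
line = 'Drosophila yakuba,cgcgcgctcgcgcatacggcctaatgcgcgcgctagcgatgc,hdt739,85\n'

# csv.reader accepts any iterable of lines; io.StringIO stands in for an open file
rows = list(csv.reader(io.StringIO(line)))
species, sequence, gene_name, expression = rows[0]
expression = int(expression)  # every field is read as a string, so convert explicitly

print(species, gene_name, expression, len(sequence))
```

Each field lands in its own variable, in the same order as the columns described above.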
In [ ]:
# %load data.csv
Drosophila melanogaster,atatatatatcgcgtatatatacgactatatgcattaattatagcatatcgatatatatatcgatattatatcgcattatacgcgcgtaattatatcgcgtaattacga,kdy647,264
Drosophila melanogaster,actgtgacgtgtactgtacgactatcgatacgtagtactgatcgctactgtaatgcatccatgctgacgtatctaagt,jdg766,185
Drosophila simulans,atcgatcatgtcgatcgatgatgcatccgactatcgtcgatcgtgatcgatcgatcgatcatcgatcgatgtcgatcatgtcgatatcgt,kdy533,485
Drosophila yakuba,cgcgcgctcgcgcatacggcctaatgcgcgcgctagcgatgc,hdt739,85
Drosophila ananassae,ttacgatcgatcgatcgatcgatcgtcgatcgtcgatgctacatcgatcatcatcggattagtcacatcgatcgatcatcgactgatcgtcgatcgtagatgctgacatcgatagca,hdu045,356
Drosophila ananassae,gcatcgatcgatcgcggcgcatcgatcgcgatcatcgatcatacgcgtcatatctatacgtcactgccgcgcgtatctacgcgatgactagctagact,teg436,222
In [6]:
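# Look at the csv module: print each row's fields joined by ", "
import csv
with open('data.csv', newline='') as csvfile:
    # data.csv is comma-separated, so the default delimiter (',') is the right one;
    # delimiter=' ' would split on the space inside species names instead
    raw_data = csv.reader(csvfile)
    for row in raw_data:
        print(', '.join(row))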
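A quick way to see why the delimiter matters is to parse the same line twice, once with the default comma and once with `delimiter=' '` (the setting from the csv documentation's example, which does not match this file). A small sketch using one line from data.csv:

```python
import csv
import io

line = 'Drosophila yakuba,cgcgcgctcgcgcatacggcctaatgcgcgcgctagcgatgc,hdt739,85\n'

# Default delimiter is ',', which matches data.csv: four fields
comma_fields = next(csv.reader(io.StringIO(line)))
# delimiter=' ' splits on the space inside the species name instead: two fields
space_fields = next(csv.reader(io.StringIO(line), delimiter=' '))

print(len(comma_fields), comma_fields[0])  # 4 Drosophila yakuba
print(len(space_fields), space_fields[0])  # 2 Drosophila
```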
In [5]:
# Each row comes back as a list of strings: [species, sequence, gene name, expression level]
import csv
with open('data.csv', newline='') as csvfile:
    raw_data = csv.reader(csvfile)
    for row in raw_data:
        print(row)
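Indexing with `row[0]`, `row[2]` and so on works, but the numbers are easy to mix up. As an alternative sketch, `csv.DictReader` can attach names to the columns; because data.csv has no header row, the names are passed in through `fieldnames` (the names chosen here are hypothetical). Two lines from the file are inlined so the example is self-contained:

```python
import csv
import io

# Two lines copied from data.csv (the real file has no header row)
sample = (
    'Drosophila melanogaster,actgtgacgtgtactgtacgactatcgatacgtagtactgatcgctactgtaatgcatccatgctgacgtatctaagt,jdg766,185\n'
    'Drosophila yakuba,cgcgcgctcgcgcatacggcctaatgcgcgcgctagcgatgc,hdt739,85\n'
)
# Hypothetical column names; DictReader uses them as the keys of each row
fieldnames = ['species', 'sequence', 'gene_name', 'expression']

records = list(csv.DictReader(io.StringIO(sample), fieldnames=fieldnames))
for record in records:
    print(record['gene_name'], record['species'])
```

Values are still strings, so `record['expression']` needs `int()` before any numeric comparison.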
In [7]:
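# Exercise: print the gene names for all genes from D. melanogaster or D. simulans
import csv
with open('data.csv', newline='') as csvfile:
    raw_data = csv.reader(csvfile)
    for row in raw_data:
        # row[0] is the species name, row[2] the gene name
        if row[0] == 'Drosophila melanogaster' or row[0] == 'Drosophila simulans':
            print(row[2])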
In [8]:
# Same filter, written as a membership test instead of two comparisons
import csv
with open('data.csv', newline='') as csvfile:
    raw_data = csv.reader(csvfile)
    for row in raw_data:
        if row[0] in ['Drosophila melanogaster', 'Drosophila simulans']:
            print(row[2])
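The membership test generalises naturally into a small reusable function. This is a sketch with a hypothetical helper name, `genes_from_species`; the rows hold the species, gene names and expression levels from data.csv, with the sequence field left empty because the filter never reads it:

```python
def genes_from_species(rows, wanted):
    # Hypothetical helper: gene names (column 2) of rows whose species (column 0) is in `wanted`
    wanted = set(wanted)  # a set gives fast membership tests; a list gives the same result here
    return [row[2] for row in rows if row[0] in wanted]

# Species, gene name and expression level from data.csv; sequence left empty
rows = [
    ['Drosophila melanogaster', '', 'kdy647', '264'],
    ['Drosophila melanogaster', '', 'jdg766', '185'],
    ['Drosophila simulans', '', 'kdy533', '485'],
    ['Drosophila yakuba', '', 'hdt739', '85'],
    ['Drosophila ananassae', '', 'hdu045', '356'],
    ['Drosophila ananassae', '', 'teg436', '222'],
]

print(genes_from_species(rows, ['Drosophila melanogaster', 'Drosophila simulans']))
# ['kdy647', 'jdg766', 'kdy533']
```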
In [11]:
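# Exercise: print the gene names for all genes between 90 and 110 bases long.
# Both limits must hold at once, so use `and` (here a chained comparison);
# with `or`, every sequence would pass.
import csv
with open('data.csv', newline='') as csvfile:
    raw_data = csv.reader(csvfile)
    for row in raw_data:
        if 90 <= len(row[1]) <= 110:
            print(row[2])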
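To check the length condition, it helps to evaluate both versions for a few lengths. In this sketch, any number is at least 90 or at most 110, so the `or` form is always True, while the chained comparison only accepts lengths inside the range:

```python
# Compare the `or` condition with the chained comparison for a few lengths
results = []
for length in [50, 100, 150]:
    with_or = length >= 90 or length <= 110
    chained = 90 <= length <= 110
    results.append((length, with_or, chained))
    print(length, with_or, chained)
# 50 True False
# 100 True True
# 150 True False
```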
In [15]:
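def has_low_at_content(dna):
    # Fraction of A and T bases in the sequence (case-insensitive);
    # True when that fraction is below 0.5
    dna = dna.upper()
    at_content = (dna.count('A') + dna.count('T')) / len(dna)
    return at_content < 0.5
In [16]:
# Exercise: print the gene names for genes whose AT content is below 0.5
# and whose expression level is greater than 200
import csv
with open('data.csv', newline='') as csvfile:
    raw_data = csv.reader(csvfile)
    for row in raw_data:
        # row[3] (expression level) is read as a string, so convert it first
        if has_low_at_content(row[1]) and int(row[3]) > 200:
            print(row[2])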
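The `int(row[3])` conversion is not optional. csv returns every field as a string, and strings compare character by character, so an expression level of 85 would wrongly look larger than 200. A short sketch using the expression level of hdt739:

```python
expression = '85'  # expression level of hdt739, as csv.reader returns it (a string)

as_text = expression > '200'       # character by character: '8' > '2', so True
as_number = int(expression) > 200  # compares the numbers 85 and 200, so False

print(as_text, as_number)  # True False
```

Comparing a string directly with the integer 200 (`'85' > 200`) would raise a `TypeError` in Python 3 rather than silently giving a wrong answer.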
In [20]:
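# Exercise: print the gene names for genes whose name starts with "k" or "h",
# except those from Drosophila melanogaster
import csv
with open('data.csv', newline='') as csvfile:
    raw_data = csv.reader(csvfile)
    for row in raw_data:
        # The parentheses matter: `and` binds more tightly than `or`
        if (row[2].startswith('k') or row[2].startswith('h')) and row[0] != 'Drosophila melanogaster':
            print(row[2])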
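`str.startswith` also accepts a tuple of prefixes and returns True if any of them match, which removes the need for the inner `or`. A sketch of the same exercise on (species, gene name) pairs taken from data.csv:

```python
genes = [
    ('Drosophila melanogaster', 'kdy647'),
    ('Drosophila melanogaster', 'jdg766'),
    ('Drosophila simulans', 'kdy533'),
    ('Drosophila yakuba', 'hdt739'),
    ('Drosophila ananassae', 'hdu045'),
    ('Drosophila ananassae', 'teg436'),
]

# startswith(('k', 'h')) is True when the name starts with either prefix
selected = [name for species, name in genes
            if name.startswith(('k', 'h')) and species != 'Drosophila melanogaster']
print(selected)  # ['kdy533', 'hdt739', 'hdu045']
```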
In [21]:
def at_percentage(dna):
    # Returns the AT content as a fraction between 0 and 1 (not a percentage)
    length = len(dna)
    a_count = dna.upper().count('A')
    t_count = dna.upper().count('T')
    at_content = (a_count + t_count) / length
    return at_content
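Despite its name, `at_percentage` returns a fraction. A worked example on a short illustrative sequence (not from data.csv): in `atgcgc`, 2 of the 6 bases are A or T, giving 2/6 ≈ 0.333. The `%` format specifier turns that fraction into a true percentage when one is wanted for display:

```python
def at_percentage(dna):
    # Same calculation as above: fraction of A/T bases, between 0 and 1
    dna = dna.upper()
    return (dna.count('A') + dna.count('T')) / len(dna)

fraction = at_percentage('atgcgc')  # 2 of 6 bases are A or T
print(round(fraction, 3))           # 0.333
print(f'{fraction:.1%}')            # 33.3%
```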
In [22]:
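# Exercise: for each gene, print its name and whether its AT content is
# high (above 0.65), low (below 0.45) or medium (between 0.45 and 0.65)
import csv
with open('data.csv', newline='') as csvfile:
    raw_data = csv.reader(csvfile)
    for row in raw_data:
        at_percent = at_percentage(row[1])
        if at_percent > 0.65:
            print(row[2], 'AT content is high')
        elif at_percent < 0.45:
            print(row[2], 'AT content is low')
        else:
            print(row[2], 'AT content is medium')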
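Pulling the thresholds into a small function makes the boundary behaviour easy to check. In this sketch (`classify_at` is a hypothetical name), values of exactly 0.45 or 0.65 fall through both tests and are classed as medium:

```python
def classify_at(at_content):
    # Hypothetical helper using the same thresholds as the cell above
    if at_content > 0.65:
        return 'high'
    elif at_content < 0.45:
        return 'low'
    return 'medium'

for value in [0.3, 0.45, 0.5, 0.65, 0.8]:
    print(value, classify_at(value))
# 0.3 low
# 0.45 medium
# 0.5 medium
# 0.65 medium
# 0.8 high
```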